Method and device for the generation or decoding of a scalable data stream with provision for a bit-store, encoder and scalable encoder

ABSTRACT

In a method for generating a scalable data stream, when a block of output data of a first encoder is present, this block of output data is written into the scalable data stream. If output data of a second encoder is present for a preceding section, this output data for the preceding section is written, in the transmission direction, behind the block of output data of the first encoder into the data stream. When the output data of the second encoder for the current section is present, the output data of the second encoder is written into the bit stream subsequent to the output data of the first encoder. A determining data block is generated and written into the bit stream delayed by a period of time which corresponds to the size of the bit savings bank of the second encoder. Finally, buffer information is written into the bit stream, which indicates where the beginning of the output data of the second encoder for the current section lies with regard to the determining data block, wherein the buffer information corresponds to the bit savings bank level. Thus, it is possible to signal a bit savings bank in a scalable data stream in a simple way. The maximum size of the bit savings bank may further be adjusted depending on the intended decoder delay and be communicated to a decoder by positioning the determining data block in the scalable data stream, without spending additional bits, in order to reduce the initial delay of the decoder.

FIELD OF THE INVENTION

[0001] The present invention relates to scalable encoders and decoders and in particular to the generation of scalable data streams.

BACKGROUND OF THE INVENTION AND PRIOR ART

[0002] Scalable encoders are shown in EP 0 846 375 B1. In general, scalability is understood as the possibility of decoding a partial section of a bit stream representing an encoded data signal, e.g. an audio signal or a video signal, into a useful signal. This property is particularly desirable when e.g. a data transmission channel fails to provide the complete bandwidth necessary for transmitting a complete bit stream. On the other hand, an incomplete decoding is possible on a decoder with reduced complexity. Generally, different discrete scalability layers are defined in practice.

[0003] An example of a scalable encoder as defined in Subpart 4 (General Audio) of Part 3 (Audio) of the MPEG-4 Standard (ISO/IEC 14496-3:1999, Subpart 4) is shown in FIG. 1. An audio signal s(t) to be encoded is fed into the scalable encoder on the input side. The scalable encoder shown in FIG. 1 contains a first encoder 12, which is an MPEG CELP encoder. The second encoder 14 is an AAC encoder, which provides high-quality audio encoding and is defined in the Standard MPEG-2 AAC (ISO/IEC 13818). The CELP encoder 12 provides a first scaling layer via an output line 16, while the AAC encoder 14 provides a second scaling layer via a second output line 18, to a bit stream multiplexer (BitMux) 20. On the output side the bit stream multiplexer then outputs an MPEG-4 LATM bit stream 22 (LATM = Low-Overhead MPEG-4 Audio Transport Multiplex). The LATM format is described in Section 6.5 of Part 3 (Audio) of the first supplement to the MPEG-4 Standard (ISO/IEC 14496-3:1999/AMD1:2000).

[0004] The scalable audio encoder includes some further elements. First, there is a delay stage 24 in the AAC branch and a delay stage 26 in the CELP branch. With both delay stages it is possible to set an optional delay for the respective branch. A downsampling stage 28 is downstream of the delay stage 26 of the CELP branch to adjust the sampling rate of the input signal s(t) to the sampling rate required by the CELP encoder. An inverse CELP decoder 30 is downstream of the CELP encoder 12, wherein the CELP encoded/decoded signal is then supplied to an upsampling stage 32. The upsampled signal is then supplied to a further delay stage 34, which is termed "Core Coder Delay" in the MPEG-4 Standard.

[0005] The stage CoreCoderDelay 34 has the following function. If the delay is set to zero, the first encoder 12 and the second encoder 14 process exactly the same samples of the audio input signal in a so-called superframe. A superframe might e.g. consist of three AAC frames, which together represent a certain number of samples No. x to No. y of the audio signal. The superframe further includes e.g. 8 CELP blocks, which represent the same number of samples and also the same samples No. x to No. y if CoreCoderDelay=0.

[0006] If, however, a CoreCoderDelay D is set as a time value other than zero, the three blocks of AAC frames nevertheless represent the same samples No. x to No. y. The eight blocks of CELP frames, in contrast, represent the samples No. x − Fs·D to No. y − Fs·D, wherein Fs is the sampling frequency of the input signal.
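By way of illustration only, the sample ranges described for CoreCoderDelay may be sketched as follows. The function name and the numeric values are hypothetical and do not come from the standard; D is assumed to be given in seconds and Fs in Hz, so that Fs·D is a number of samples.

```python
def superframe_sample_ranges(x, y, fs_hz, core_coder_delay_s):
    """Return ((aac_first, aac_last), (celp_first, celp_last)) sample indices.

    Illustrative assumption: the AAC frames cover samples x..y, while the
    CELP blocks cover the same number of samples shifted back by Fs * D.
    """
    shift = round(fs_hz * core_coder_delay_s)  # Fs * D, expressed in samples
    return (x, y), (x - shift, y - shift)


# Hypothetical superframe of 3 x 1024 samples at 48 kHz with D = 5 ms.
aac, celp = superframe_sample_ranges(x=3072, y=6143, fs_hz=48000,
                                     core_coder_delay_s=0.005)
print(aac, celp)
```

With D = 0 both ranges coincide; for any other D the two ranges still contain the same number of samples.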

[0007] The current time sections of the input signal in a superframe for the AAC blocks and the CELP blocks can thus be either identical, when CoreCoderDelay D=0, or be shifted relative to each other by CoreCoderDelay, when D is not equal to zero. For the following implementations, however, it will be assumed, for simplicity and without loss of generality, that CoreCoderDelay=0, so that the current time section of the input signal for the first encoder and the current time section for the second encoder are identical. In general, however, the only requirement for a superframe is that the AAC block(s) and the CELP block(s) in a superframe represent the same number of samples, wherein it is not necessary for the samples themselves to be identical to one another, but they may also be shifted relative to each other by CoreCoderDelay.

[0008] It should be noted that the CELP encoder, depending on the configuration, may process a section of the input signal s(t) faster than the AAC encoder 14. In the AAC branch a block decision stage 26 is downstream of the optional delay stage 24 and establishes, among other things, whether short or long windows should be used for windowing the input signal s(t), wherein short windows must be chosen for strongly transient signals, while long windows are preferred for less transient signals since the ratio of payload data to side information is better than for short windows.

[0009] In the present example the block decision stage 26 introduces a fixed delay of e.g. 5/8 of a block. This is referred to as a look-ahead function in the art. The block decision stage must already look ahead a certain time to be able to determine whether there are transient signals in the future that must be encoded with short windows. After that the corresponding signal in the CELP branch as well as the signal in the AAC branch are fed to means for converting the time-domain representation into a spectral representation, which are designated as MDCT 36 and 38, respectively, in FIG. 1 (MDCT = modified discrete cosine transform). The output signals of the MDCT blocks 36, 38 are then supplied to a subtracter 40.

[0010] At this point, samples that belong together in time must be present, i.e. the delay must be identical in both branches.

[0011] The following block 44 determines whether it is more favorable to supply the input signal itself to the AAC encoder 14. This is enabled via the bypass branch 42. If it is determined, however, that the differential signal at the output of the subtracter 40 is smaller regarding energy than the signal output by the MDCT block 38, then not the original signal but the differential signal is taken to be encoded by the AAC encoder 14 to finally form the second scaling layer 18. This comparison may be performed band by band, which is indicated by frequency-selective switching means (FSS) 44. The exact functions of the individual elements are known in the art and are described for example in the MPEG-4 standard as well as in further MPEG standards.
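The band-by-band energy comparison performed by the frequency-selective switching means may be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure laid down in the MPEG-4 standard: the function name, the band layout (one list of MDCT coefficients per band) and the example values are all hypothetical.

```python
def fss_select(original_bands, core_bands):
    """Per band, pick the difference signal if it has less energy than the original.

    original_bands / core_bands: one list of MDCT coefficients per frequency
    band (hypothetical layout). Returns (selected_bands, use_difference_flags).
    """
    selected, flags = [], []
    for orig, core in zip(original_bands, core_bands):
        diff = [o - c for o, c in zip(orig, core)]   # output of the subtracter
        energy_orig = sum(v * v for v in orig)
        energy_diff = sum(v * v for v in diff)
        use_diff = energy_diff < energy_orig
        selected.append(diff if use_diff else orig)  # bypass keeps the original
        flags.append(use_diff)
    return selected, flags


original = [[1.0, -2.0], [0.5, 0.25]]
core = [[0.9, -1.8], [-0.5, -0.25]]  # good match in band 0, poor match in band 1
bands, flags = fss_select(original, core)
print(flags)
```

In this example the difference signal is chosen for band 0 only; band 1 is passed through unchanged via the bypass.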

[0012] One main feature of the MPEG-4 standard, and of other encoder standards, is that the compressed data signal is to be transmitted via a channel at a constant bit rate. All high-quality audio codecs operate on a block basis, i.e. they process blocks of audio data (on the order of 480 to 1024 samples) into pieces of a compressed bit stream, which are also referred to as frames. The bit stream format must be set up so that a decoder without a priori information on where a frame starts is able to recognize the beginning of a frame in order to start the output of decoded audio signal data with the lowest possible delay. Thus, each header or determining data block of a frame starts with a certain synchronization word which may be searched for in a continuous bit stream. Further common components within the data stream apart from the determining data block are the main data or "payload data" of the individual layers, in which the actual compressed audio data is contained.
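Searching a continuous stream for the synchronization word may be sketched as follows. The sketch assumes a byte-aligned, two-byte synchronization word chosen purely for illustration; the function name and the value of SYNC are hypothetical. A practical decoder would additionally have to rule out positions where payload data happens to imitate the synchronization word.

```python
def find_frame_starts(stream: bytes, sync_word: bytes):
    """Return every byte offset at which sync_word occurs in stream."""
    starts = []
    pos = stream.find(sync_word)
    while pos != -1:
        starts.append(pos)
        pos = stream.find(sync_word, pos + 1)
    return starts


SYNC = b"\xff\xf1"  # hypothetical synchronization word, not taken from the text
stream = b"\x12\x34" + SYNC + b"\x00\x01\x02" + SYNC + b"\x7f"
print(find_frame_starts(stream, SYNC))
```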

[0013] FIG. 4 shows a bit stream format with a fixed frame length. In this bit stream format the headers or determining data blocks are inserted equidistantly into the bit stream. The side information associated with this header and the main data follow immediately afterwards. The length, i.e. the number of bits, for the main data is the same in each frame. Such a bit stream format as it is shown in FIG. 4 is for example used in the MPEG layer 2 or the MPEG-CELP.

[0014] FIG. 5 shows another bit stream format with a fixed frame length and a backpointer. In this bit stream format the header and the side information are arranged equidistantly as in the format illustrated in FIG. 4. The associated main data, however, start directly after a header only in exceptional cases. In most cases the start is in one of the preceding frames. The number of bits by which the start of the main data is shifted in the bit stream is transferred by the side information variable backpointer. The end of these main data may lie within this frame or within a preceding frame. The length of the main data is therefore not constant any more. Therefore, the number of bits with which a block is encoded may be adjusted to the characteristics of the signal. At the same time, a constant bit rate may nevertheless be achieved. This technology is called "bit savings bank" and increases the theoretical delay within the transmission chain. Such a bit stream format is for example used in the MPEG layer 3 (MP3).
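How a backpointer locates the start of the main data may be sketched as follows. This is a simplified model under stated assumptions: positions are counted in bits within the concatenated main-data areas of all frames (headers and side information excluded), every frame offers the same main-data capacity, and the function name and numbers are hypothetical.

```python
def main_data_start(frame_index, main_bits_per_frame, backpointer):
    """Bit offset at which the main data of a frame begins.

    The nominal start is directly behind the frame's own header; the
    backpointer moves the start back into preceding frames.
    """
    nominal_start = frame_index * main_bits_per_frame
    start = nominal_start - backpointer
    if start < 0:
        raise ValueError("backpointer reaches before the start of the stream")
    return start


backpointers = [0, 200, 350, 100]  # hypothetical side information values
starts = [main_data_start(i, 1000, bp) for i, bp in enumerate(backpointers)]
lengths = [b - a for a, b in zip(starts, starts[1:])]
print(starts, lengths)
```

Although every frame occupies the same 1000 bits of main-data capacity in this example, the blocks themselves are encoded with 800, 850 and 1250 bits, which is the variable block length at constant bit rate described above.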

[0015] The technology of the bit savings bank is further described in the standard MPEG layer 3.

[0016] Generally, the bit savings bank represents a buffer of bits which may be used to provide more bits for encoding a block of time samples than is actually allowed by the constant output data rate. The technology of the bit savings bank takes into account that some blocks of audio samples may be encoded with fewer bits than predetermined by the constant transmission rate, so that through these blocks the bit savings bank is filled, while other blocks of audio samples comprise psychoacoustic characteristics which do not allow such a high compression, so that for these blocks the available bits would actually not be enough for a low-interference or interference-free encoding, respectively. The additional bits needed are taken from the bit savings bank, so that the bit savings bank is emptied by such blocks.

[0017] Such an audio signal may, however, also be transmitted in a format with a variable frame length, as it is shown in FIG. 6. With the bit stream format "variable frame length", as it is illustrated in FIG. 6, the fixed sequence of the bit stream elements header, side information and main data is maintained, as with the "fixed frame length". As the length of the main data is not constant, the bit savings bank technology may also be used here; no backpointers as in FIG. 5 are needed, however. One example of a bit stream format as it is illustrated in FIG. 6 is the transport format ADTS (audio data transport stream), as it is defined in the standard MPEG 2 AAC.

[0018] It is to be noted that the above-mentioned encoders are not scalable encoders but include only one single audio encoder.

[0019] In MPEG 4 the combination of different encoders/decoders into a scalable encoder/decoder is provided. It is therefore possible and sensible to combine one CELP voice encoder as the first encoder with an AAC encoder for the further scaling layer(s) and pack the same into one bit stream. The purpose of this combination is that the possibility remains open either to decode all scaling layers and therefore reach the best possible audio quality, or parts of the same, maybe even only the first scaling layer, with the correspondingly restricted audio quality. One reason for decoding only the lowest scaling layer may be that, due to a bandwidth of the transmission channel which is too small, the decoder has only received the first scaling layer of the bit stream. Because of this, the parts of the first scaling layer in the bit stream are favored over the second and the further scaling layers in the transmission, whereby the transmission of the first scaling layer is guaranteed even with capacity bottlenecks in the transmission network, while the second scaling layer may be lost completely or in part.

[0020] A further reason may be that a decoder wants to achieve the lowest possible codec delay and therefore decodes only the first scaling layer. It is to be noted that the codec delay of a CELP codec is generally significantly smaller than the delay of the AAC codec.

[0021] In MPEG 4 version 2 the transport format LATM is standardized, which may among other things also transmit scalable data streams.

[0022] In the following, reference is made to FIG. 2a. FIG. 2a is a schematic illustration of the samples of the input signal s(t). The input signal may be divided into different successive sections 0, 1, 2, 3, wherein each section comprises a certain fixed number of time samples. Usually, the AAC encoder 14 (FIG. 1) processes a whole section 0, 1, 2 or 3 in order to provide an encoded data signal for this section. The CELP encoder 12 (FIG. 1), however, usually processes a smaller amount of time samples per encoding step. Thus, it is shown as an example in FIG. 2b that the CELP encoder, or generally speaking the first encoder or encoder 1, comprises a block length which is one fourth of the block length of the second encoder. It is to be noted that this division is completely arbitrary. The block length of the first encoder may also be half as long, but might also be one eleventh of the block length of the second encoder. Thus, the first encoder will generate four blocks (11, 12, 13, 14) from the section of the input signal from which the second encoder provides one block of data. In FIG. 2c a common LATM bit stream format is shown.

[0023] One superframe may comprise several ratios of number of AAC frames to number of CELP frames, as it is illustrated in tabular form in MPEG 4. Thus, a superframe may for example comprise one AAC block and 1 to 12 CELP blocks, or 3 AAC blocks and 8 CELP blocks, but also, for example, more AAC blocks than CELP blocks, depending on the configuration. An LATM frame which comprises an LATM determining data block includes one superframe or also several superframes.

[0024] The generation of the LATM frame opened by the header 1 is described as an example. First, the output data blocks 11, 12, 13, 14 of the CELP encoder 12 (FIG. 1) are generated and buffered. In parallel, the output data block of the AAC encoder designated with "1" in FIG. 2c is generated. Then, when the output data block of the AAC encoder has been generated, first of all the determining data block (header 1) is written. Depending on the convention, the output data block of the first encoder which was generated first, designated with 11 in FIG. 2c, may be written, i.e. transmitted, directly following header 1. Usually (in view of the small amount of signaling information then needed) an equidistant spacing of the output data blocks of the first encoder is selected for the further writing and/or transmitting of the data stream, as it is illustrated in FIG. 2c. This means that after writing and/or transmitting block 11, the second output data block 12 of the first encoder, then the third output data block 13 of the first encoder and then the fourth output data block 14 of the first encoder are written and/or transmitted at equidistant intervals. The output data block 1 of the second encoder is filled into the remaining gaps during the transmission. Then, an LATM frame is fully written, i.e. fully transmitted.
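The frame layout just described, with the blocks of the first encoder at regular intervals and the output data of the second encoder filling the gaps, may be sketched as follows. This is only a schematic model and not the LATM syntax: the function name is hypothetical, all blocks are plain byte strings, and the spacing is exactly equidistant only when the payload of the second encoder divides evenly among the gaps (otherwise the gaps differ by at most one byte).

```python
def layout_frame(header: bytes, celp_blocks, aac_data: bytes) -> bytes:
    """Write the header, then each CELP block followed by a share of the AAC data."""
    n = len(celp_blocks)
    base, extra = divmod(len(aac_data), n)  # AAC bytes per gap, plus remainder
    frame = bytearray(header)
    pos = 0
    for i, block in enumerate(celp_blocks):
        frame += block                       # block 11, 12, 13, 14 in turn
        size = base + (1 if i < extra else 0)
        frame += aac_data[pos:pos + size]    # AAC data fills the gap
        pos += size
    return bytes(frame)


frame = layout_frame(b"H", [b"a", b"b", b"c", b"d"], b"1" * 8)
print(frame)
```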

[0025] One disadvantage of the bit stream formats illustrated in FIGS. 4 to 6 is the fact that they are only known for simple encoders, not, however, for scalable encoders and in particular not for scalable encoders having a bit savings bank function.

[0026] As is known, the bit savings bank is used so that the variable output data rate which a psychoacoustic encoder inherently generates may be adjusted to a constant output data rate. In other words, the number of bits an audio encoder needs depends on the signal characteristics. If the signal is such that it may be quantized in a relatively coarse way, then a relatively low number of bits is needed for encoding this signal. If the signal is, however, such that it needs to be quantized very finely in order not to introduce audible interferences, then a larger number of bits is needed for encoding this signal.

[0027] In order to achieve a constant output data rate, an average number of bits is determined for one section of a signal to be encoded. If the number of bits actually needed for encoding a section is smaller than the determined number of bits, then the bits which are not needed may be placed into the bit savings bank. Thus, the bit savings bank is filled. If, however, a section of a signal to be encoded is such that a larger number of bits than the determined number is needed for encoding in order not to introduce audible interferences into the signal, then the additionally needed bits may be taken from the bit savings bank. That way, the bit savings bank is emptied. Thereby it may be guaranteed that a constant output data rate is maintained and at the same time no audible interferences are introduced into the audio signal. A precondition for this is that the bit savings bank is selected to be sufficiently large.
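The filling and emptying of the bit savings bank may be sketched as follows. This is a minimal sketch under stated assumptions: the bank starts empty, the per-block bit demands are hypothetical, and bits that would exceed the maximum size are simply discarded here (a practical encoder would have to spend them, e.g. as fill bits, to keep the output rate constant).

```python
def run_bit_reservoir(demands, mean_bits, max_size):
    """Simulate the bank level over a sequence of per-block bit demands.

    Returns (granted, levels): bits actually granted per block and the
    level of the bank after each block.
    """
    level = 0
    granted, levels = [], []
    for demand in demands:
        available = mean_bits + level            # per-block budget plus savings
        used = min(demand, available)            # cannot spend more than that
        level = min(available - used, max_size)  # unused bits are saved, capped
        granted.append(used)
        levels.append(level)
    return granted, levels


granted, levels = run_bit_reservoir([1500, 1800, 3000, 2600],
                                    mean_bits=2048, max_size=10240)
print(granted, levels)
```

In this example the first two blocks fill the bank; the third block demands more than budget plus savings and is granted only 2844 bits, which empties the bank, so the fourth block has to make do with the average 2048 bits.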

[0028] In the standard MPEG AAC (ISO/IEC 13818-7:1997) a bit savings bank is referred to as "bit reservoir". The maximum size of the bit savings bank for channels with a constant data rate may be calculated by subtracting the average number of bits per block from the maximum decoder input buffer size. Its value is usually firmly preset to 10,240 bits according to the standard MPEG AAC with a transmission rate of 96 kBit/s for a stereo signal with a sampling rate of 48 kHz. The maximum value of the bit savings bank, i.e. the size of the bit savings bank, is chosen so that even under bad conditions, i.e. even when the signal comprises many sections which may not be encoded with the determined number of bits, no audible interferences need to be introduced into the audio signal in order to maintain the constant output data rate. This is only possible when the bit savings bank is sized sufficiently large so that it is emptied at no time.

[0029] On the decoder side this has the following consequence. Since the decoder has to consider that both the case of a full bit savings bank and the case of an empty bit savings bank may occur in the course of decoding an audio signal, the decoder needs to buffer a number of bits corresponding to the size of the bit savings bank before it starts decoding at all. Thereby it is guaranteed that the decoder does not run out of bits while decoding the audio signal. If a decoder were to decode a signal encoded with the bit savings bank function immediately upon receiving it, then the bits for the output would already run out when the first block to be decoded happened to need a smaller number of bits than the determined number for encoding, i.e. when the bit savings bank was filled up by the first block. In other words, the bit savings bank function inevitably leads to a delay within the decoder, wherein this delay corresponds to the size of the bit savings bank.

[0030] For the preceding example the size of the bit savings bank is 10,240 bits. This leads to an inherent initial delay due to the bit savings bank of about 0.1 s. The delay becomes larger the larger the maximum size of the bit savings bank is selected and the smaller the transmission rate is selected.
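The figures of the preceding example may be reproduced with the following sketch. Two values used here are not stated in the text and are assumptions about AAC: a decoder input buffer of 6144 bits per channel and a block length of 1024 samples. The function names are hypothetical.

```python
def max_bit_reservoir(channels, bitrate_bps, sample_rate_hz,
                      frame_len=1024, buffer_bits_per_channel=6144):
    """Maximum decoder input buffer size minus the average bits per block.

    frame_len and buffer_bits_per_channel are assumed AAC values that do
    not appear in the text above.
    """
    mean_bits_per_block = bitrate_bps * frame_len // sample_rate_hz
    return channels * buffer_bits_per_channel - mean_bits_per_block


def initial_delay_s(reservoir_bits, bitrate_bps):
    """Time needed to receive reservoir_bits at a constant bit rate."""
    return reservoir_bits / bitrate_bps


size = max_bit_reservoir(channels=2, bitrate_bps=96_000, sample_rate_hz=48_000)
delay = initial_delay_s(size, 96_000)
print(size, round(delay, 3))
```

Under these assumptions the sketch yields 10,240 bits and a delay of roughly 0.107 s, in line with the "about 0.1 s" given above. Halving the bit rate at the same bank size would double the delay.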

[0031] If, for example, real-time transmissions of a telephone call are considered, in which a continuous change of speakers takes place, then already due to the bit savings bank a delay of the mentioned size occurs with each change of speaker. Such a delay is extraordinarily disturbing for both communication partners and typically leads to one speaker, because he does not immediately hear a reaction of the other speaker, repeating the question, which contributes to further confusion. Therefore, a product designed this way is not suitable for real-time applications and would not have a chance of a breakthrough in the market.

SUMMARY OF THE INVENTION

[0032] It is the object of the present invention to provide an encoder comprising a bit savings bank function through which a smaller transmission delay may be achieved, to provide a method and a device for generating a scalable data stream in which a bit savings bank function may be signalized, and to provide a method and a device for decoding a scalable data stream in which a bit savings bank function is signalized.

[0033] In accordance with a first aspect of the invention, this object is achieved by a method for generating a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder represents a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder, and wherein the at least one block of output data of the second encoder represents a number of samples of the input signal into the second encoder, wherein the number of samples defines a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, comprising: when a block of output data of the first encoder is present, writing the at least one block of output data of the first encoder into the scalable data stream; when output data of the second encoder for a preceding section of the input signal for the second encoder is present, writing the output data of the second encoder for the preceding section of the input signal for the second encoder in the transmission direction behind a block of output data of the first encoder; when output data of the second encoder for the current section of the second encoder is present, writing the output data of the second encoder in the transmission direction behind the output data of the second encoder for a preceding section of the input signal for the second encoder into the bit stream; generating a determining data block, when the block of output data of the second encoder for the current section of the second encoder is ready, and writing the determining data block delayed by a period of time with regard to the generation of the determining data block, wherein the period of time is smaller than or equal to a delay which corresponds to the maximum size of the bit savings bank of the second encoder; and writing buffer information into the bit stream which indicates where the beginning of the output data of the second encoder for the current section of the input signal for the second encoder is with regard to the determining data block.

[0034] In accordance with a second aspect of the invention, this object is achieved by an encoder comprising a bit savings bank, wherein the bit savings bank comprises a maximum size, comprising: means for adjusting the maximum size of the bit savings bank depending on a delay provided for an audio decoder; and means for transmitting the adjusted maximum size of the bit savings bank in an output-side data stream.

[0035] In accordance with a third aspect of the invention, this object is achieved by a scalable encoder, comprising: a first encoder for generating a block of output data for the first encoder; a second encoder comprising a bit savings bank, wherein the bit savings bank comprises a maximum size, for generating a block of output data for the second encoder, wherein the second encoder further comprises means for adjusting the maximum size of the bit savings bank depending on an initial delay provided for an audio decoder; a bit stream multiplexer for generating a scalable data stream, wherein the bit stream multiplexer is implemented to write the block of output data for the first encoder into a scalable data stream, write the block of output data for the second encoder into the scalable data stream, generate a determining data block after the block of output data of the second encoder has been output by the second encoder, write the determining data block into the scalable data stream delayed by a period of time, wherein the period of time corresponds to the maximum size of the bit savings bank, and write buffer information into the bit stream which indicates how far the beginning of the output data of the second encoder lies before the determining data block in the transmission direction, wherein the buffer information corresponds to a current level of the bit savings bank.

[0036] In accordance with a fourth aspect of the invention, this object is achieved by a device for generating a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder represents a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder represents a number of samples of the input signal into the second encoder, wherein the number of samples defines a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or are shifted in relation to each other by an adjustable period of time, comprising: means for writing a block of output data of the first encoder into the scalable data stream, when a block of output data of the first encoder is present; means for writing output data of the second encoder for a preceding section of the input signal for the second encoder in the transmission direction behind a block of output data of the first encoder when the output data of the second encoder for the preceding section of the input signal for the second encoder is present; means for writing output data of the second encoder for the current section of the time signal for the second encoder in the transmission direction behind the output data of the second encoder for a preceding section of the input signal for the second encoder into the bit stream when the output data of the second encoder is present for the current section of the second encoder; means for generating a determining data block when the block of output data of the second encoder is present for the current section of the second encoder, and for writing the determining data block delayed by a period of time with regard to the generation of the determining data block, wherein the period of time is smaller than or equal to a delay which corresponds to the maximum size of the bit savings bank of the second encoder; and means for writing buffer information into the bit stream which indicates where the beginning of the output data of the second encoder is for the current section of the second encoder with regard to the determining data block.

[0037] In accordance with a fifth aspect of the invention, this object is achieved by a method for decoding a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder represents a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder represents a number of samples of the input signal into the second encoder, wherein the number of samples defines a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal, and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, wherein the scalable data stream comprises output data of the first encoder, output data of the second encoder for a preceding section, output data of the second encoder for the current section, a determining data block and buffer information, comprising: buffering the scalable data stream; reading the block of output data of the first encoder for the current section of the first encoder; reading the determining data block and the buffer information from the buffered data stream; determining the beginning of the block of output data of the second encoder for the current section of the second encoder using the buffer information; and decoding the block of output data of the first encoder and the block of output data of the second encoder, if necessary considering the adjustable period of time by which the current section of the first encoder and the current section of the second encoder are time-shifted in relation to each other.

[0038] In accordance with a sixth aspect of the invention, this object is achieved by a device for decoding a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder represents a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder represents a number of samples of the input signal into the second encoder, wherein the number of samples defines a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, wherein the scalable data stream comprises output data of the first encoder, output data of the second encoder for a preceding section, output data of the second encoder for a current section, a determining data block and buffer information, comprising: means for buffering the scalable data stream; means for reading the block of output data of the first encoder for the current section of the first encoder; means for reading the determining data block and the buffer information from the buffered data stream; means for determining the beginning of the block of output data of the second encoder for the current section of the second encoder using the buffer information; and means for decoding the block of output data of the first encoder and the block of output data of the second encoder, if necessary considering the adjustable period of time by which the current section of the first encoder and the current section of the second encoder are time-shifted in relation to each other.

[0039] The present invention is based on the finding that the present concept of a fixed bit savings bank size must be discarded in order to achieve a reduced-delay decoding. According to the invention, this is achieved by making the maximum size of the bit savings bank of an encoder adjustable, wherein the bit savings bank is adjusted depending on the application and on the intended decoder function. For a purely one-directional data transmission, a large bit savings bank may be selected in order to satisfy the highest possible audio quality requirements, while for a bi-directional communication, in which transmitter and receiver, or speakers, respectively, change frequently, a smaller bit savings bank size is to be adjusted. So that the decoder may profit from a smaller bit savings bank size, the bit savings bank size must be communicated to the decoder in some way. This may on the one hand be achieved by transmitting additional information in the data stream; it may, however, also be performed implicitly, without transmitting additional side information or signalizing information, as is illustrated in particular with reference to the scalable case.

[0040] One advantage of the present invention is that the decoder delay may now be influenced directly via the adjustment of the maximum size of the bit savings bank. If a smaller maximum size of the bit savings bank is selected, the decoder may also insert a smaller delay before it starts decoding, without risking that it runs out of output data during decoding, which needs to be prevented in any case. The “price” which has to be paid for this is that one or the other section of the audio signal is not encoded with 100% of the audio quality, as the bit savings bank was empty and no additional bits were available any more. Usually, an audio encoder reacts in this case by violating the psychoacoustic masking threshold when quantizing and, in order to make do with the available number of bits, selects a coarser quantization than is really needed. The main advantage of the smaller decoder delay is, however, guaranteed. The reduction of the size of the bit savings bank in order to reach a smaller delay on the decoder side is therefore paid for with a lower audio quality, wherein this lower audio quality only occurs now and then in the audio signal, and when the audio signal is simple to decode it may not occur at all. As a result, the inflexibility of the bit savings bank according to the prior art is overcome, which may be over-dimensioned for many applications in order to encode all possible cases with a high audio quality, so that a use of encoders for a bi-directional communication with frequently changing speakers becomes possible, which was not conceivable up to now due to the large, fixedly adjusted bit savings bank.

[0041] The inventive variability of the bit savings bank and the accompanying variability of the delay on the decoder side are especially advantageous in the case of a scalable audio encoder, as here a reduced-delay decoding may now be achieved not only for the first, lowest scaling layer but also for higher scaling layers, which are for example generated by an AAC encoder. In particular, in the scalable case only one scaling layer is influenced by the variable adjustment of the bit savings bank, while the other scaling layer(s) remain unaffected. It is thus possible to act upon individual scaling layers deliberately without causing any changes in the other scaling layers.

[0042] As already discussed, it is necessary to communicate the freely selected bit savings bank size to the decoder. This was not necessary in the prior art, as a fixed bit savings bank size was always agreed upon, so that a decoder introduced the corresponding delay, for example by dimensioning its input buffer, knowing the bit savings bank size which was firmly agreed on.

[0043] In particular for scalable encoders and scalable data streams, an adjustable bit savings bank size without additional side information may be achieved simply by positioning a determining data block within the scalable data stream. According to the invention, the determining data block is positioned within the bit stream so that the decoder, when it receives the determining data block, needs to receive as many bits for the respective layer as is determined by the average block length.

[0044] After receiving a frame, the decoder may start decoding without calculating or inserting a delay. This is achieved due to the fact that the determining data block is already written into the scalable data stream in a delayed manner with regard to the first and the second scaling layer, i.e. preferably delayed by a period of time which corresponds to the adjustment of the bit savings bank. Thereby it is achieved that the encoder may select any bit savings bank size depending on the requirement and that the selected bit savings bank size is simply signalized to the decoder implicitly, by entering the determining data block in the bit stream in a delayed manner with regard to the payload data.

[0045] In other words, the consequence is that the determining data block is no longer written at the first possible point of time, i.e. delay-optimized, as in the prior art, but at the latest possible point of time, without delaying the AAC block. The current level of the bit savings bank may then be signalized by the so-called backpointer, which indicates where the data of a preceding section end and where the data of the current section begin.

[0046] This is true both for the non-scalable case, in which only output data of one individual encoder occur in the bit stream, and for the scalable case, in which data of at least two different encoders occur in the scalable bit stream. If a superframe, i.e. a section in the bit stream comprising a first number of output data blocks of a first encoder and a second number of output data blocks of a second encoder which relate to the same number of samples of an input signal, comprises a plurality of blocks of an encoder, then the number of blocks of the one encoder which are associated with a determining data block can simply be signalized by transferring offset information with the bit stream. The offset information may also be interpreted by the decoder as a backpointer in order to know which data of the bit stream belong to a determining data block and therefore correspond to a time section of the input signal, if necessary considering the variable core coder delay.

[0047] One main advantage of this arrangement is that the decoder, when it receives an inventive data stream, need not calculate and insert a delay, as the delay was already considered on the encoder side by the positioning of the determining data block alone. The decoder can therefore output a frame immediately after reception. This also provides the possibility of signalizing an adjusted maximum bit savings bank size in a simple way, i.e. without additional bits. As the signalization may be performed simply and without effort, i.e. by the position of the determining data block, it is also possible to vary the bit savings bank size easily, and in particular without access to the decoder, in order to adjust the transmission delay as desired.

BRIEF DESCRIPTION OF THE DRAWINGS

[0048] In the following, preferred embodiments of the present invention are explained in more detail referring to the accompanying drawings, in which:

[0049] FIG. 1a shows a scalable encoder according to MPEG 4 which comprises the present invention;

[0050] FIG. 1b shows a decoder according to the present invention;

[0051] FIG. 2a shows a schematic illustration of an input signal which is divided into successive time sections;

[0052] FIG. 2b shows a schematic illustration of an input signal which is divided into successive time sections, wherein the ratio of the block length of the first encoder to the block length of the second encoder is illustrated;

[0053] FIG. 2c shows a schematic illustration of a scalable data stream with a high delay in decoding the first scaling layer;

[0054] FIG. 2d shows a schematic illustration of a scalable data stream with a low delay in decoding the first scaling layer;

[0055] FIG. 2e shows a schematic illustration of an inventive scalable data stream wherein the determining data block is delayed with reference to the payload data;

[0056] FIG. 3 shows a detailed illustration of the inventive scalable data stream regarding the example of a CELP encoder as the first encoder and an AAC encoder as the second encoder with a bit savings bank function;

[0057] FIG. 4 shows an example for a bit stream format with a fixed frame length;

[0058] FIG. 5 shows an example for a bit stream format with a fixed frame length and a backpointer; and

[0059] FIG. 6 shows an example of a bit stream format with a variable frame length.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

[0060] In the following, FIG. 2d is referred to in comparison to FIG. 2c in order to explain a bit stream with a small delay of the first scaling layer. As in FIG. 2c, the scalable data stream contains successive determining data blocks which are referred to as header 1 and header 2. In the preferred embodiment of the present invention, which is implemented according to the MPEG 4 standard, the determining data blocks are LATM headers. As in the prior art, in the transmission direction from an encoder to a decoder, which is illustrated in FIG. 2d by an arrow 202, the LATM header 200 is followed by the parts of the output data block of the AAC encoder, hatched from top right to bottom left, which are inserted in the gaps remaining between the output data blocks of the first encoder.

[0061] In contrast to the prior art, the frame started by the LATM header 200 no longer contains only output data blocks of the first encoder which belong to this frame, like for example the output data blocks 13 and 14, but also the output data blocks 21 and 22 of the following section of input data. In other words, in the example illustrated in FIG. 2d, the two output data blocks of the first encoder which are designated with 11 and 12 are present in the bit stream before the LATM header 200 in the transmission direction (arrow 202). In the example illustrated in FIG. 2d, the offset information 204 indicates an offset of two output data blocks of the first encoder. When FIG. 2d is compared to FIG. 2c, it may be seen that a decoder which is only interested in the first scaling layer may decode the lowest scaling layer earlier than in FIG. 2c by a time which exactly corresponds to this offset. The offset information, which may for example be signalized in the form of a “core frame offset”, serves to determine the position of the first output data block 11 in the bit stream.

[0062] For the case of core frame offset=zero, the bit stream indicated in FIG. 2c results. If, however, core frame offset>zero, then the corresponding output data block 11 of the first encoder is transmitted earlier by core frame offset output data blocks of the first encoder. In other words, the delay between the first output data block of the first encoder after the LATM header and the first AAC frame results from core coder delay (FIG. 1)+core frame offset×core block length (block length of encoder 1 in FIG. 2b). As becomes clear from the comparison of FIGS. 2c and 2d, for the case of core frame offset=zero (FIG. 2c), the output data blocks 11 and 12 of the first encoder are transmitted after the LATM header 200. By the transmission of core frame offset=2, the output data blocks 13 and 14 may follow after the LATM header 200, whereby the delay with a pure CELP decoding, i.e. the decoding of the first scaling layer, is reduced by two CELP block lengths. An offset of three blocks would be optimum in the example. An offset of one or two blocks, however, already brings a delay advantage.
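The delay relation stated in the preceding paragraph can be written out as a small calculation. The following is only a minimal sketch: the function name and the numeric values (a core coder delay of 363 samples and a core block length of 240 samples) are illustrative assumptions and are not taken from the figures.

```python
def core_to_aac_delay(core_coder_delay: int,
                      core_frame_offset: int,
                      core_block_length: int) -> int:
    """Delay in samples between the first core block behind the header and
    the first AAC frame: core coder delay + core frame offset * core block length."""
    if core_frame_offset < 0:
        raise ValueError("core frame offset must not be negative")
    return core_coder_delay + core_frame_offset * core_block_length


if __name__ == "__main__":
    # Illustrative values only (assumptions, not read from FIG. 2b or FIG. 2d).
    for offset in (0, 2):
        print("core frame offset", offset, "->",
              core_to_aac_delay(363, offset, 240), "samples")
```

With core frame offset=0 the result reduces to the core coder delay alone, which corresponds to the arrangement of FIG. 2c.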

[0063] Through this bit stream structure it is possible for the CELP encoder to transmit a generated CELP block directly after encoding. In this case the bit stream multiplexer (20), and thus the scalable combination, adds no additional delay to the CELP delay, so that the delay is at its minimum.

[0064] It is noted that the case illustrated in FIG. 2d is only exemplary. Different ratios of the block length of the first encoder to the block length of the second encoder are possible, which may for example vary from 1:2 to 1:12, but may also take other values.

[0065] In the extreme case (1:12 for MPEG 4 AAC/CELP) this means that, for the same time section of the input signal for which the AAC encoder generates one output data block, the CELP encoder generates twelve output data blocks. The delay advantage of the data stream illustrated in FIG. 2d over the data stream illustrated in FIG. 2c may in this case easily take magnitudes from one fourth up to half a second. This advantage increases the greater the ratio between the block length of the second encoder and the block length of the first encoder becomes, wherein in the case of an AAC encoder as the second encoder a block length as great as possible is aimed at, if the signal to be encoded admits it, due to the then more favorable ratio of payload information to side information.

[0066] In FIG. 2c a scalable data stream according to the LATM format is illustrated in which the data blocks of the first encoder have to be buffered, i.e. delayed. In the format of FIG. 2c this results from the fact, as discussed, that the header may only be written when the output data of the second encoder are present, as the header includes information about the length, i.e. the number of bits, of the output data block of the second encoder.

[0067] Thus, FIG. 2d already illustrates an improvement in that the output data blocks of the first encoder are written into the bit stream earlier in order to reduce the delay when a decoder only wants to decode the lowest scaling layer. Nevertheless, the determining data block is still located before the output data block of the second encoder, which is designated with “1” in FIG. 2d.

[0068] In FIG. 2e, by comparison with FIG. 2c, the inventive scalable data stream is illustrated, wherein the determining data block (header 1, 200) is no longer written immediately when it is available, i.e. before the output data block of the first encoder which is designated with “11”; instead, the determining data block 200 is written into the data stream delayed by a period of time in relation to the case of FIG. 2c. In a preferred embodiment of the present invention, this period of time equals the maximum size of the bit savings bank (max bufferfullness 250). Therefore, the output data block of the second encoder for the current section of the input signal, designated by the determining data block 200, starts a number of bits equal to bufferfullness 260 before the determining data block in the transmission direction from an encoder to a decoder, whereas it can be seen from FIG. 2c that there the AAC data start behind the determining data block.

[0069] From the point of view of the decoder, the pointer 260 is therefore a backpointer.

[0070] For the case that the first encoder provides a larger number of blocks for a number of samples than the second encoder, as in FIG. 2e, a core frame offset is signalized based on the determining data block, so that a decoder knows which blocks of output data of the first encoder belong to a block of output data of the second encoder, or are related to the same via the core coder delay, respectively. The ratio illustrated in FIG. 2e of four blocks of output data of the first encoder to one block of output data of the second encoder for the same number of samples is only exemplary.

[0071] If FIG. 2d is now compared to FIG. 2e, it may be seen that an offset 204 is also present in FIG. 2e. The offset 204, which has a value of 2 in FIG. 2d, would increase to a value of 5 in the case of FIG. 2e, as the determining data block 200 in FIG. 2e has been shifted backwards by three output data blocks of the first encoder compared to FIG. 2d.
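The role of the offset as a backpointer over core blocks can be sketched as follows. This is a hedged illustration only: the list of block labels, the function name and the index convention (counting core blocks in transmission order) are assumptions made for the example, based on a reading of the description of FIGS. 2d and 2e rather than on the figures themselves.

```python
def first_core_block_index(first_index_behind_header: int,
                           core_frame_offset: int) -> int:
    """Index (in transmission order) of the first core block that belongs to
    the frame designated by a header, found by stepping back by the offset."""
    index = first_index_behind_header - core_frame_offset
    if index < 0:
        raise ValueError("offset points before the start of the buffered stream")
    return index


# Hypothetical core block labels in transmission order.
core_blocks = [11, 12, 13, 14, 21, 22, 23, 24]

# Reading of FIG. 2d: blocks 11 and 12 precede the header, so block 13
# (index 2) is the first block behind it and the offset is 2.
fig_2d = core_blocks[first_core_block_index(2, 2)]

# Reading of FIG. 2e: the header is shifted back by three more core blocks,
# so block 22 (index 5) is the first block behind it and the offset is 5.
fig_2e = core_blocks[first_core_block_index(5, 5)]

if __name__ == "__main__":
    print(fig_2d, fig_2e)
```

In both readings the offset leads back to the same core block, which is the point of increasing the offset when the header is shifted.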

[0072] In the following, reference is made to FIG. 1a again. In addition to the scalable encoder already described in the introduction to the description, the inventive scalable encoder illustrated in FIG. 1a contains a bit savings bank control block 50 and a control line 52 from the AAC encoder 14 to the bit stream multiplexer 20, via which the maximum size of the bit savings bank, which was adjusted by the bit savings bank control 50, may be communicated to the bit stream multiplexer so that the same may perform the bit stream formatting required in FIG. 2e.

[0073] FIG. 1b shows a schematic block diagram of a scalable decoder which is complementary to the scalable encoder of FIG. 1a. The scalable bit stream, which is supplied to the decoder via a line 60, is fed into an input buffer/bit stream demultiplexer 62 of the decoder. Here, the bit stream is divided in order to extract the required blocks for a CELP decoder 64 and an AAC decoder 66. The inventive decoder further includes an AAC delay stage 68 which serves for introducing a delay corresponding to the bit savings bank size, so that the AAC decoder 66 never runs out of data to output. According to the invention, this AAC delay stage is now implemented variably, wherein the delay is controlled depending on the bit savings bank information, which is extracted from the bit stream by the bit stream demultiplexer 62 and supplied to the AAC delay stage 68 via a bit savings bank information line 70. The delay of the AAC delay stage 68 is adjusted depending on the bit savings bank level. If a small bit savings bank is adjusted by the bit savings bank control means 50 of FIG. 1a, then the AAC delay stage 68 may also be adjusted to a small delay, so that a reduced-delay decoding of the second scaling layer may be achieved.

[0074] The scalable decoder of FIG. 1b further includes MDCT means 72 to transform the time domain output signals of the CELP decoder 64 into the frequency domain, and an upsampling stage upstream of the same. The spectrum is delayed by the delay stage 74, which compensates time differences present between the two branches, so that equal conditions are present at means 76, which are referred to as adder/FSS⁻¹. Means 76 basically performs the function analogous to the subtractor 40 and the FSS 44 of FIG. 1a. After block 76, the spectral values are transformed by means 78 for performing a back-transformation from the frequency domain into the time domain, so that at an output 80 either only the second scaling layer or the first and the second scaling layer are present in the time domain. At an output 82, however, only the first scaling layer generated by the CELP decoder 64 is present in the time domain.

[0075] In the following, reference is made to FIG. 3, which is similar to FIG. 2 but illustrates the special implementation referring to the example of MPEG 4. In the first row, a current time section is again shown hatched. In the second row, the windowing which is used with the AAC encoder is illustrated schematically. As is known, an overlap-and-add of 50% is used, so that a window usually comprises double the number of time samples of the current time section which is illustrated hatched in the top row of FIG. 3. FIG. 3 further illustrates the delay tdip, which corresponds to block 26 of FIG. 1 and comprises a size of ⅝ of the block length in the selected example. Typically, a block length of the current time section of 960 samples is used, so that the delay tdip of ⅝ of the block length comprises 600 samples. For example, the AAC encoder provides a bit stream of 24 kbit/s, while the CELP encoder schematically illustrated below provides a bit stream with a rate of 8 kbit/s. The overall bit rate is then 32 kbit/s.

[0076] As may be seen from FIG. 3, the output data blocks zero and one of the CELP encoder correspond to the current time section for the first encoder. The output data block of the CELP encoder with the number 2 already corresponds to the next time section. The same holds true for the CELP block with the number 3. In FIG. 3, the delay of the downsampling stage 28 and the CELP encoder 12 is further illustrated by an arrow which is designated by the reference numeral 302. From this, the delay designated by core coder delay and illustrated by an arrow 304 in FIG. 3 results as the delay which needs to be adjusted by stage 34 so that equal conditions are present at the subtraction location 40 of FIG. 1. This delay may alternatively be generated by block 26. For example:

core coder delay = tdip − CELP encoder delay − downsampling delay = 600 − 120 − 117 = 363 samples.
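The arithmetic of this example can be reproduced with a few lines of code. This is a minimal sketch; the function and constant names are chosen for illustration, while the numeric values (block length 960, tdip of ⅝ of the block length, CELP encoder delay 120, downsampling delay 117) are the ones given in the text.

```python
BLOCK_LENGTH = 960  # samples per current time section, as stated in the text


def tdip_samples(block_length: int) -> int:
    """Delay tdip as 5/8 of the block length, in samples."""
    return block_length * 5 // 8


def core_coder_delay(tdip: int, celp_encoder_delay: int,
                     downsampling_delay: int) -> int:
    """core coder delay = tdip - CELP encoder delay - downsampling delay."""
    return tdip - celp_encoder_delay - downsampling_delay


if __name__ == "__main__":
    tdip = tdip_samples(BLOCK_LENGTH)
    print("tdip:", tdip, "samples")
    print("core coder delay:", core_coder_delay(tdip, 120, 117), "samples")
```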

[0077] For the case without a bit savings bank function, or for the case that the bit savings bank (bit mux outputbuffer) is full, which is indicated by the variable bufferfullness=max, the case indicated in FIG. 2d results. In contrast to FIG. 2d, in which four output data blocks of the first encoder are generated corresponding to one output data block of the second encoder, in FIG. 3 two output data blocks of the CELP encoder, designated with “0” and “1”, are generated for one output data block of the second encoder, which is drawn in black in the two last rows of FIG. 3. According to the invention, however, it is no longer the output data block of the CELP encoder with the number “0” which is written behind a first LATM header 306, but the output data block of the CELP encoder with the number “1”, as the output data block with the number “0” has already been transmitted to the decoder. In the equidistant raster provided for the CELP data blocks, the CELP block 1 is then followed by the CELP block 2 for the next time section, wherein, for the completion of a frame, the rest of the data of the output data block of the AAC encoder is then written into the data stream until a next LATM header 308 for the next time section follows.

[0078] The present invention may simply be combined with the bit savings bank function, as illustrated in the last row of FIG. 3. If the variable “bufferfullness”, which indicates the level of the bit savings bank, is smaller than the maximum value, this means that the AAC frame for the directly preceding time section needed more bits than is actually admissible. This means that the CELP frames are written behind the LATM header 306 as before, but that first the at least one output data block of the AAC encoder from one or several preceding time sections needs to be written into the bit stream before the writing of the output data block of the AAC encoder for the current time section may be started. From the comparison of the last two rows of FIG. 3, which are designated by “1” and “2”, it may be seen that the bit savings bank function also directly leads to a delay in the encoder for the AAC frame. The data for the AAC frame of the current time section, which is designated by 310 in FIG. 3, is present at the same point of time as in case “1”, but can only be written into the bit stream after the AAC data 312 for the directly preceding time section have been written into the bit stream. The initial position of the AAC frame is therefore shifted depending on the bit savings bank level of the AAC encoder. The bit savings bank level is transferred in the LATM element StreamMuxConfig by the variable “bufferfullness”. The variable bufferfullness is calculated as the variable bit reservoir divided by 32 times the number of audio channels actually present.
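The calculation of bufferfullness described at the end of this paragraph may be sketched as follows. The function name and the example values are hypothetical, and the use of integer division (rounding down) is an assumption, as the text does not state how a remainder is handled.

```python
def bufferfullness(bit_reservoir_bits: int, num_channels: int) -> int:
    """bufferfullness = bit reservoir / (32 * number of audio channels).

    Integer division is an assumption of this sketch.
    """
    if num_channels <= 0:
        raise ValueError("at least one audio channel is required")
    if bit_reservoir_bits < 0:
        raise ValueError("bit reservoir must not be negative")
    return bit_reservoir_bits // (32 * num_channels)


if __name__ == "__main__":
    # Hypothetical reservoir levels in bits for a mono and a stereo signal.
    print(bufferfullness(3200, 1))
    print(bufferfullness(6400, 2))
```

Dividing by the channel count makes the signalized value independent of the number of channels, so that the same reservoir level per channel yields the same bufferfullness.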

[0079] It is to be noted that the pointer designated by the reference numeral 314 in FIG. 3, whose length=max bufferfullness−bufferfullness, is a forward-pointer which points to the future, as it were, while the pointer illustrated in FIG. 5 is a backpointer which points to the past, as it were. The reason for this is that, according to the present embodiment, the LATM header is always written into the bit stream after the current time section has been processed by the AAC encoder, although AAC data from preceding time sections may still have to be written into the bit stream.

[0080] It is further noted that the pointer 314 is deliberately drawn interrupted below the CELP block 2, as it considers neither the length of the CELP block 2 nor the length of the CELP block 1, since this data has of course nothing to do with the bit savings bank of the AAC encoder. Further, no header data and no bits of possibly present further layers are considered.

[0081] In the decoder, the CELP frames are first of all extracted from the bit stream, which is easily possible as the same are for example arranged equidistantly and comprise a fixed length.

[0082] In the LATM header, however, the length and distance of all CELP blocks may be signalized, so that a direct decoding is possible in every case.

[0083] Thereby, the parts of the output data of the AAC encoder of the directly preceding time section, which were so to speak separated by the CELP block 2, are joined again, and the LATM header 306 so to speak moves to the beginning of the pointer 314, so that the decoder, knowing the length of the pointer 314, knows when the data of the directly preceding time section end, in order to then decode the directly preceding time section together with the CELP data blocks present for the same with full audio quality once this data has been completely read in.

[0084] In contrast to the case illustrated in FIG. 2c, in which an LATM header is followed both by the output data blocks of the first encoder and by the output data block of the second encoder, the output data blocks of the first encoder may now be shifted forward in the bit stream by the variable core frame offset, while the output data block of the second encoder may be shifted to the back of the scalable data stream by the arrow 314 (max bufferfullness−bufferfullness). The bit savings bank function may thus be implemented easily and safely in the scalable data stream as well, while the basic raster of the bit stream is maintained by the successive LATM determining data blocks, which are always written when the AAC encoder has encoded a time section. These may therefore serve as a reference point even when a major part of the data in the frame designated by an LATM header originates from the next time section (regarding the CELP frames) or from the preceding time section (regarding the AAC frame), as illustrated in the last row of FIG. 3, wherein the respective shifts are communicated to a decoder by two variables to be additionally transmitted in the bit stream.

[0085] For purposes of illustration, the last row of FIG. 3 describes the case, as discussed, in which the LATM header 306 is written into the bit stream immediately after it has been generated, so that the LATM header 306 is followed by output data 312 of the second encoder for the preceding time section, wherein the output data of the second encoder for the current time section, which the LATM header 306 refers to, only follow at a distance behind the LATM header in the transmission direction, wherein the distance is given by the difference between max bufferfullness and bufferfullness, as illustrated in FIG. 3.

[0086] In contrast to this, according to the present invention, as illustrated with reference to FIG. 2e, the LATM header 306 is no longer written when it has been generated, but is written delayed by a period of time which corresponds to max bufferfullness. According to the invention, the LATM header 306 would therefore stand behind a position 330 within the bit stream, depending on the value of bufferfullness, and the forward-pointer 314 is replaced by a backward-pointer (260 in FIG. 2e).
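The relation between the forward-pointer of FIG. 3 and the backward-pointer of FIG. 2e can be checked numerically. The sketch below rests on simplifying assumptions: positions are counted only in bits of the AAC layer (CELP blocks and header bits are ignored, as for the pointer 314), bufferfullness is expressed in the same unit as the positions, and all function names are hypothetical.

```python
def aac_start_forward(header_pos: int, max_bufferfullness: int,
                      bufferfullness: int) -> int:
    """Header written immediately: the current AAC data start
    (max bufferfullness - bufferfullness) behind the header."""
    return header_pos + (max_bufferfullness - bufferfullness)


def delayed_header_position(header_pos: int, max_bufferfullness: int) -> int:
    """Header written delayed by max bufferfullness."""
    return header_pos + max_bufferfullness


def aac_start_backward(delayed_header_pos: int, bufferfullness: int) -> int:
    """Delayed header: the current AAC data start bufferfullness
    before the header, i.e. the pointer is a backpointer."""
    return delayed_header_pos - bufferfullness


if __name__ == "__main__":
    header_pos, max_bf = 1000, 300  # illustrative values
    for bf in (0, 120, 300):
        forward = aac_start_forward(header_pos, max_bf, bf)
        backward = aac_start_backward(
            delayed_header_position(header_pos, max_bf), bf)
        print(bf, forward, backward)
```

Under these assumptions both descriptions point to the same start of the current AAC data; only the position of the header, and with it the direction of the pointer, changes.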

[0087] According to the invention, the arrangement selected in FIGS. 2c and 2d and also in FIG. 3, in which a CELP block immediately follows the LATM header, is discarded.

[0088] Instead, the following priority distribution is preferred when writing data into the scalable bit stream in order to achieve a reduced-delay decoding of the first scaling layer as well as a reduced-delay decoding of the second scaling layer.

[0089] The output data blocks of the first encoder enjoy a high priority. Whenever an output data block of the first encoder is complete, this output data block is written into the bit stream. From this, the equidistant raster of output data blocks of the first encoder results automatically; when a CELP encoder is used, these blocks further have an equal length.

[0090] If no output data of the first encoder to be written is currently present, output data of the AAC encoder for the preceding time section of the input signal is written into the bit stream until no corresponding data is present anymore. Only then is the writing of the output data of the AAC encoder for the current section started. The writing of this output data into the bit stream is always interrupted when output data of the first encoder is available again, as may be seen in FIG. 2e.

[0091] The writing of the output data of the AAC encoder for the current time section is further also interrupted when an LATM header is complete and the same has been delayed by max bufferfullness 250 (FIG. 2e). The scalable bit stream is complete when the corresponding values for bufferfullness 260 and offset 270 have been entered into the bit stream either separately or via the determining data block.
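The priority distribution of the preceding paragraphs can be summarized as a small decision function. This is only a sketch under stated assumptions: the function name, its arguments and the returned labels are hypothetical, and the relative order of a complete core block and a due header is an assumption, as the text only states that both interrupt the writing of AAC data.

```python
from typing import Optional


def next_item_to_write(core_block_ready: bool,
                       delayed_header_due: bool,
                       prev_aac_bits_left: int,
                       cur_aac_bits_left: int) -> Optional[str]:
    """Select what the bit stream multiplexer writes next."""
    if core_block_ready:
        # Output data blocks of the first encoder enjoy a high priority.
        return "core_block"
    if delayed_header_due:
        # The header, delayed by max bufferfullness, interrupts AAC data.
        return "header"
    if prev_aac_bits_left > 0:
        # AAC data of the preceding time section is written first ...
        return "aac_previous"
    if cur_aac_bits_left > 0:
        # ... and only then AAC data of the current time section.
        return "aac_current"
    return None  # nothing to write at the moment


if __name__ == "__main__":
    print(next_item_to_write(True, True, 10, 10))
    print(next_item_to_write(False, False, 10, 10))
    print(next_item_to_write(False, False, 0, 10))
```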

[0092] In the following, reference is made to the decoding of a bit stream generated this way. When the decoder is only interested in the first scaling layer, i.e. the output data blocks of the first encoder (CELP encoder), it will simply take one CELP block after the other from the bit stream and decode the same, without regard to the LATM header or the AAC data. As the CELP blocks are preferably written into the bit stream immediately after their creation, a reduced-delay decoding of the CELP blocks is guaranteed.

[0093] When the decoder wishes to decode both the first and the second scaling layer, i.e. wants to obtain an audio signal with a high quality, it needs to establish the association between the CELP blocks and the AAC block(s) for a superframe, i.e. for a certain number of samples, wherein if necessary a core coder delay (34 of FIG. 1a) is to be considered when the current time section of the input signal of the AAC encoder regarding a superframe is shifted from the current time section of the CELP encoder.

[0094] This is performed by the decoder buffering the bit stream until it hits an LATM header, e.g. the header 200 of FIG. 2e. Knowing the offset 270, the decoder may then determine which output data blocks of the first encoder belong to the LATM header 200. Considering the variable bufferfullness, the decoder further knows where in the data stored in the decoder input buffer the AAC frame of the time section begins that the LATM header refers to. In the case of bufferfullness equal to max, the whole AAC frame of interest is already contained in the decoder input buffer. In the case of bufferfullness equal to 0, the AAC frame of interest begins immediately behind the LATM header, so that the decoder may begin to decode without delay, using the data already stored in the input buffer, or using a part of the data stored in the input buffer together with a directly arriving part of data which stands behind the LATM header in the transmission direction. The bit savings bank size is therefore signalized only implicitly by the position of the determining data block with reference to the payload data in the bit stream, without any side information being required. In this case, the stage with a variable delay in the decoder (block 68 of FIG. 1b) and the line 70 of FIG. 1b may also be dispensed with.
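How a decoder might locate the AAC frame from the header position and the backpointer can be sketched as follows. The sketch makes several assumptions that are not part of the text: positions are counted in bits of the AAC layer only (CELP blocks and header bits already removed), bufferfullness is given in the same unit, the frame length is known, and the function name and return format are hypothetical.

```python
from typing import Tuple


def locate_aac_frame(header_pos: int, bufferfullness: int,
                     max_bufferfullness: int,
                     frame_length: int) -> Tuple[int, int, int]:
    """Return (start position, bits already buffered before the header,
    bits still to arrive behind the header) for the current AAC frame."""
    if not 0 <= bufferfullness <= max_bufferfullness:
        raise ValueError("bufferfullness must lie between 0 and the maximum")
    start = header_pos - bufferfullness          # backpointer from the header
    buffered = min(bufferfullness, frame_length)  # part in the input buffer
    return start, buffered, frame_length - buffered


if __name__ == "__main__":
    # Illustrative values: header at AAC-layer position 1000, maximum 300.
    print(locate_aac_frame(1000, 0, 300, 200))    # frame starts behind header
    print(locate_aac_frame(1000, 100, 300, 200))  # frame partly buffered
    print(locate_aac_frame(1000, 300, 300, 200))  # frame completely buffered
```

In this model the frame is completely buffered at bufferfullness equal to max only if it is not longer than the maximum; this restriction is a property of the sketch, not a statement of the text.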

What is claimed is:
 1. Method for generating a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and the current level, wherein the at least one block of output data of the first encoder illustrates a number of samples of the input signal in the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder, and wherein the at least one block of output data of the second encoder illustrates a number of samples of the input signal in the second encoder, wherein the number of samples illustrates a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, comprising: when a block of output data of the first encoder is present, writing the at least one block of output data of the first encoder into the scalable data stream; when output data of the second encoder for a preceding section of the input signal for the second encoder is present, writing the output data of the second encoder for the preceding section of the input signal for the second encoder in the transmission direction behind a block of output data of the first encoder; when output data of the second encoder for the current section of the second encoder is present, writing the output data of the second encoder in the transmission direction behind the output data of the second encoder for a preceding section of the input signal for the second encoder into the bit stream; generating a determining data block, when the block of output data of the second encoder for the current section of the second encoder is ready, and writing the determining data block delayed by a period of time with regard to the generation of the determining data block, wherein the period of time is smaller or equal to a delay which corresponds to the maximum size of the bit savings bank of the second encoder; and writing buffer information into the bit stream which indicates where the beginning of the output data of the second encoder for the current section of the input signal for the second encoder is with regard to the determining data block.
 2. Method according to claim 1, wherein the period of time is equal to a delay which corresponds to the maximum size of the bit savings bank, and wherein the buffer information corresponds to the current level of the bit savings bank for the current section of the input signal for the second encoder.
 3. Method according to claim 1, wherein the determining data block is written with a high priority, wherein the blocks of output data of the first encoder are written with a lower priority, and wherein the at least one block of output data of the second encoder for a preceding section of the input signal is written with a higher priority into the bit stream than the at least one block of output data of the second encoder for the current section.
 4. Method according to claim 1, wherein the first encoder provides at least two blocks for a number of samples, wherein the method further comprises: writing offset information into the bit stream, which indicates, how many blocks of output data of the first encoder in transmission direction before the determining data block belong to the current section of the first encoder.
 5. Encoder comprising a bit savings bank, wherein the bit savings bank comprises a maximum size, comprising: means for adjusting the maximum size of the bit savings bank depending on a delay provided for an audio decoder; and means for transmitting the adjusted maximum size of the bit savings bank in an output-side data stream.
 6. Scalable encoder, comprising: a first encoder for generating a block of output data for the first encoder; a second encoder comprising a bit savings bank, wherein the bit savings bank comprises a maximum size for generating a block of output data for the second encoder, wherein the second encoder further comprises means for adjusting the maximum size of the bit savings bank depending on an initial delay provided for an audio decoder; a bit stream multiplexer for generating a scalable data stream, wherein the bit stream multiplexer is implemented to write the block of output data for the first encoder into a scalable data stream, write the block of output data for the second encoder into the scalable data stream, generate a determining data block after the block of output data of the second encoder has been output by the second encoder, write the determining data block into the scalable data stream delayed by a period of time, wherein the period of time corresponds to the maximum size of the bit savings bank, and write buffer information into the bit stream which indicates how far the beginning of the output data of the second encoder lies before the determining data block in the transmission direction, wherein the buffer information corresponds to a current level of the bit savings bank.
 7. Device for generating a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder illustrates a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder illustrates a number of samples of the input signal into the second encoder, wherein the number of samples illustrates a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or are shifted in relation to each other by an adjustable period of time, comprising: means for writing a block of output data of the first encoder into the scalable data stream, when a block of output data of the first encoder is present; means for writing output data of the second encoder for a preceding section of the input signal for the second encoder in transmission direction behind a block of output data of the first encoder when the output data of the second encoder for the preceding section of the input signal are present for the second encoder; means for writing output data of the second encoder for the current section of the time signal for the second encoder in transmission direction behind the output data of the second encoder for a preceding section of the input signal for the second encoder into the bit stream when the output data of the second encoder is present for the current section of the second encoder; means for generating a determining data block when the block of output data of the second encoder is present for the current section of the second encoder, and for writing the determining data block delayed by a period of time with regard to the generation of the determining data block, wherein the period of time is smaller or equal to a delay which corresponds to the maximum size of the bit savings bank of the second encoder; and means for writing buffer information into the bit stream which indicates where the beginning of the output data of the second encoder is for the current section of the second encoder with regard to the determining data block.
 8. Method for decoding a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder illustrates a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder illustrates a number of samples of the input signal into the second encoder, wherein the number of samples illustrates a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal, and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, wherein the scalable data stream comprises output data of the first encoder, output data of the second encoder for a preceding section, output data of the second encoder for the current section, a determining data block and buffer information, comprising: buffering the scalable data stream; reading the block of output data of the first encoder for the current section of the first encoder; reading the determining data block and the buffer information from the buffered data stream; determining the beginning of the block of output data of the second encoder for the current section of the second encoder using the buffer information; and decoding the block of output data of the first encoder and the block of output data of the second encoder if necessary considering the adjustable period of time by which the current section of the first encoder and the current section of the second encoder are time-shifted in relation to each other.
 9. Device for decoding a scalable data stream from at least one block of output data of a first encoder and at least one block of output data of a second encoder, wherein the second encoder includes a bit savings bank which is defined by a maximum size and a current level, wherein the at least one block of output data of the first encoder illustrates a number of samples of the input signal into the first encoder, wherein the number of samples defines a current section of the input signal for the first encoder and wherein the at least one block of output data of the second encoder illustrates a number of samples of the input signal into the second encoder, wherein the number of samples illustrates a current section of the input signal for the second encoder, wherein the number of samples for the first encoder and the number of samples for the second encoder are equal and wherein the current sections for the first and the second encoder are identical or shifted in relation to each other by an adjustable period of time, wherein the scalable data stream comprises output data of the first encoder, output data of the second encoder for a preceding section, output data of the second encoder for a current section, a determining data block and buffer information, comprising: means for buffering the scalable data stream; means for reading the block of output data of the first encoder for the current section of the first encoder; means for reading the determining data block and the buffer information from the buffered data stream; means for determining the beginning of the block of output data of the second encoder for the current section of the second encoder using the buffer information; and means for decoding the block of output data of the first encoder and the block of output data of the second encoder if necessary considering the adjustable period of time by which the current section of the first encoder and the current section of the second encoder are time-shifted in relation to each other.